The topic of this post is what King et al. (2000) call “statistical simulation” and Gelman and Hill (2007) call “informal Bayesian inference.”
The idea is to explore the results of classical methods by pushing data through different components of fitted models. Interestingly, King et al. (2000) compare the approach to the bootstrap, while Gelman and Hill (2007) consider it analogous to full Bayes with non-informative priors. I’ve used this approach a surprising amount in my work. It provides a framework for interpretations that are descriptively rich yet simple to implement and explain. In particular, it offers a way to make contrasts and other linear combinations of parameters that are otherwise difficult.
Typically, the situation where I’ve reached for informal Bayes is one where I used a classical method for expediency (too much data and/or too many parameters for my equipment) but have a less conventional question or am not satisfied with the coefficient report. Generally, this approach has allowed me to quantify heterogeneity in fixed effects and gain intuition about the uncertainty in predictions.
For this post, I’m going to focus on using informal Bayes to find the difference between parameters in linear models. The example here is simple but extends easily to more complex models. In practice, I’ve used this approach to compare parameters across levels of hierarchical models, as well as to assess which intervention, among several implemented simultaneously, is the most advantageous target for additional investment.
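Before building anything reusable, here is a minimal sketch of the mechanics (the toy data and variable names are made up purely for illustration): fit a model, draw coefficient vectors from a multivariate normal centered at the point estimates with the model’s variance-covariance matrix, and summarize whatever contrast you care about. The rest of the post wraps this idea in a more general function.
Code
# Minimal sketch: fit a linear model, draw from the approximate posterior of the
# coefficients, and summarize the difference between two slopes.
set.seed(1)
toy <- data.frame(x1 = rnorm(200), z1 = rnorm(200))
toy$y <- 1 + 2 * toy$x1 + 3 * toy$z1 + rnorm(200)

fit   <- lm(y ~ x1 + z1, data = toy)
draws <- as.data.frame(MASS::mvrnorm(n = 5000, mu = coef(fit), Sigma = vcov(fit)))

# posterior summary of the contrast z1 - x1
quantile(draws$z1 - draws$x1, probs = c(0.025, 0.5, 0.975))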
The first thing to do is build a function that makes it easier to draw posterior samples from common models. The function below takes three inputs: a fitted model (mod), the number of posterior samples (bsamps), and a logical indicator for whether a “robust” variance-covariance matrix is desired (robust). Fundamentally, the function retrieves or calculates the variance-covariance matrix from the fitted model, then samples from a multivariate normal (using MASS::mvrnorm for simplicity) with the mean vector set to the point estimates and the variance-covariance matrix taken from the fitted model¹.
Code
infbayes <- function(mod, robust = TRUE, bsamps = 1e4) {
  # check that the arguments are reasonable
  if (is.numeric(bsamps) == FALSE) {
    stop("bsamps argument must be type numeric")
  }
  if (is.logical(robust) == FALSE) {
    stop("robust must be type logical")
  }

  # if a general/generalized linear model, is V(B) robust?
  if (class(mod)[1] %in% c("lm", "glm")) {
    if (robust == TRUE) {
      vcov_mat <- sandwich::vcovHC(mod, type = "HC")
    } else {
      vcov_mat <- vcov(mod)
    }
    # take samples
    means_vec <- coef(mod)
    srep <- as.data.frame(MASS::mvrnorm(n = bsamps, mu = means_vec, Sigma = vcov_mat))
    names(srep) <- names(means_vec)
    return(srep)
  }

  # if it includes random effects, is V(B) robust?
  # this will throw an error for multiple iid random effects
  if (class(mod)[1] %in% c("glmerMod", "lmerMod")) {
    if (robust == TRUE) {
      vcov_mat <- as.matrix(clubSandwich::vcovCR(mod, type = "CR1"))
    } else {
      vcov_mat <- as.matrix(vcov(mod))
    }
    # fixed-effect point estimates for mixed models
    means_vec <- lme4::fixef(mod)
    srep <- as.data.frame(MASS::mvrnorm(n = bsamps, mu = means_vec, Sigma = vcov_mat))
    names(srep) <- names(means_vec)
    return(srep)
  } else {
    stop("model type not supported, must be of class lm, glm, or lmerMod")
  }
}
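As a quick usage sketch (the toy model below is only for illustration, with robust = FALSE so the model-based variance-covariance matrix is used), the function returns a data frame with one row per posterior draw and one column per coefficient:
Code
# Example use of infbayes() on a small built-in data set
toy_fit <- lm(mpg ~ wt + hp, data = mtcars)
post    <- infbayes(toy_fit, robust = FALSE, bsamps = 1000)

dim(post)      # 1000 draws by 3 coefficients
head(post, 3)  # first few posterior draws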
A Simple Application
Having built a function to sample the posterior of regression parameters, let’s see how this would play out in practice. Consider the following simplified situation where we have two important variables representing factors that exist simultaneously. For example, these may be the amounts of resources allocated to two departments, or two types of amenities you offer patrons. The simultaneous deployment is important for the ceteris paribus interpretation, as well as for the setup; otherwise the model could be constructed as a choice between two options. So, say we wish to know which amenity to build out further, or which department would benefit the company more if given additional resources. It’s a slightly unconventional question, and staring at the estimates of a linear model won’t give you any insight, as the results aren’t calibrated for such a comparison. Rather, to prepare for such an analysis, we might take the approach of an informal power analysis to estimate the required sample size (e.g., number of locations, employees, or whatever the unit happens to be).
The following function takes a vector of fixed but unknown conditional mean parameters (bx), a mean vector (mu) and variance-covariance matrix (sigma) for the covariates, a constant (const), and a sample size (n), simulates the data, and returns a fitted linear model. In the example I’ve kept it to three covariates, including our two focal variables, without loss of generality.
Code
simple_sim <- function(n = 1e3, bx = c(2, 6, 4), const = 5,
                       mu = c(10, 3, 4),
                       sigma = matrix(data = c(1.11, 0.66, -0.54,
                                               0.66, 0.84, 0.43,
                                               -0.54, 0.43, 2.00), nrow = 3)) {
  # check mu and sigma
  if (length(mu) != dim(sigma)[1]) {
    stop("mu and sigma are nonconformable")
  }
  if (!is.matrix(sigma) | (nrow(sigma) == ncol(sigma)) == FALSE) {
    stop("sigma either not a matrix or not a square matrix")
  }

  # create data
  dft <- MASS::mvrnorm(n, mu = mu, Sigma = sigma)
  yy <- as.vector(rnorm(n, const, const) + dft %*% bx + rnorm(n))
  dft <- as.data.frame(dft)
  names(dft) <- letters[1:length(names(dft))]
  names(dft)[1:2] <- c("x1", "z1")
  dft <- cbind(yy, dft)

  # fit model
  lmt <- lm(yy ~ ., data = dft)
  return(lmt)
}
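As a quick sanity check (a single call, separate from the power analysis below), note that the defaults imply a true difference of 6 - 2 = 4 between the z1 and x1 coefficients:
Code
# One simulated fit at the default settings; the true z1 - x1 difference is 4
set.seed(42)
one_fit <- simple_sim(n = 500)
unname(coef(one_fit)["z1"] - coef(one_fit)["x1"])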
Next, I generate some candidate sample sizes, ranging from 100 to 1,000 in increments of 100. Typically, smaller samples are less resource intensive, so I’d like as small a sample as possible without sacrificing the precision of the estimated difference. Then, I apply the function to sample the posterior; the first five rows are shown below, where the samples column indicates the sample size used in the model.
Code
set.seed(89)

n_rep <- seq(0, 1e3, by = 100)[-1]

out <- list()
for (i in seq_along(n_rep)) {
  out[[i]] <- simple_sim(n = n_rep[[i]])
}

dfout <- lapply(out, infbayes)
for (i in seq_along(dfout)) {
  dfout[[i]]$samples <- n_rep[[i]]
}
dfout <- do.call(rbind.data.frame, dfout)

head(dfout, 5)
Having conducted the informal Bayes step, the results are easily plotted, as in the figure below. The figure gives a feel for the implications of sample size for the estimated difference in conditional mean parameters of the linear model. Under the assumptions of the model, with a true difference of 4 and no heterogeneity, you’d probably want a sample of at least 700. If those assumptions aren’t reasonable, or you need additional covariates, you can adjust the simulation accordingly.
Code
library(ggplot2)

dfout$tdiff <- dfout$z1 - dfout$x1

ggplot(data = dfout, aes(tdiff)) +
  geom_histogram(binwidth = .01, fill = "black") +
  facet_wrap(~ samples, nrow = 2) +
  labs(x = "Difference", y = "Frequency") +
  theme_bw() +
  theme(legend.position = "none",
        text = element_text(family = "Times New Roman", face = "bold", size = 12))
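If a single number per panel is easier to act on than the histograms, one option (an addition to the workflow above, not part of it) is to summarize the spread of the estimated difference at each sample size, for example the width of a central 95% interval:
Code
# Width of the central 95% interval of the estimated difference, by sample size;
# a narrower interval at a given n means the contrast is estimated more precisely
interval_width <- function(x) diff(unname(quantile(x, probs = c(0.025, 0.975))))
aggregate(tdiff ~ samples, data = dfout, FUN = interval_width)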
References
Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. https://doi.org/10.1017/CBO9780511790942
King, G., Tomz, M., & Wittenberg, J. (2000). Making the Most of Statistical Analyses: Improving Interpretation and Presentation. American Journal of Political Science, 44(2), 347–361.
Footnotes
Why multivariate normal? A general justification relies on the central limit theorem under finite variance; in this example using OLS, the sampling distribution of the conditional mean parameters is known to be marginally normal and jointly multivariate normal, with variance-covariance matrix equal to \(\sigma^2 (X'X)^{-1}\).↩︎
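As an illustrative check of this footnote (using a toy model of my own, not one from the post), the model-based variance-covariance matrix returned by vcov() for an OLS fit matches \(\hat{\sigma}^2 (X'X)^{-1}\) computed by hand:
Code
# Verify that vcov(fit) equals sigma_hat^2 * (X'X)^{-1} for a toy OLS fit
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)
all.equal(vcov(fit), sigma(fit)^2 * solve(crossprod(X)))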